However, one thing that was clear to me whilst at this year’s #infosec13 was a lack of understanding of what is meant by Data Classification. Wikipedia offers a number of definitions of Data Classification:

1. Data classification (data management)
2. Data classification (business intelligence)
3. Classification (machine learning): classification of data using machine learning algorithms
4. Assigning a level of sensitivity to classified information

All of these are relevant, but at Boldon James we want to draw your attention to the fourth point, ‘Assigning a level of sensitivity to classified information’, and more specifically to this passage: ‘Some corporations and non-government organizations also assign sensitive information to multiple levels of protection, either from a desire to protect trade secrets, or because of laws and regulations governing various matters such as personal privacy, sealed legal proceedings and the timing of financial information releases.’
Obviously, the results apply not only to the use of Machine Learning in eDiscovery, but also to legacy data clean-up, defensible disposition, and automatic data classification in records management, enterprise information archiving, and intelligent data migration from legacy systems to, for instance, the cloud.
Because of this, the demand is less for data extraction than for data classification. In both use cases, the shortcomings in adoption have been:
Regulation: HIPAA compliance and similar requirements benefit the sustainability of these technologies, but not their adoption.